Thanks to the development of 2D keypoint detectors, monocular 3D human pose estimation (HPE) via 2D-to-3D uplifting has achieved remarkable improvements. Still, monocular 3D HPE remains a challenging problem due to inherent depth ambiguities and occlusions. Many previous works exploit temporal information to mitigate these difficulties. However, there are many real-world applications where frame sequences are not accessible. This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection. Rather than exploiting temporal information, we alleviate the depth ambiguity by generating multiple 3D pose candidates that can be mapped to an identical 2D keypoint. We build a novel diffusion-based framework to effectively sample diverse 3D poses from the output of an off-the-shelf 2D detector. By replacing the conventional denoising U-Net with a graph convolutional network, which accounts for the correlation between human joints, our approach achieves further performance improvements. We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets. Comprehensive experiments demonstrate the efficacy of the proposed method and confirm that our model outperforms state-of-the-art multi-hypothesis 3D HPE methods.
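The depth ambiguity mentioned above can be illustrated for a single joint under an assumed pinhole camera: every 3D point on the same viewing ray projects to the same 2D keypoint. This is only a minimal sketch of the ambiguity, not the diffusion-based sampler described in the abstract; the function names and the focal length `f` are hypothetical.

```python
# Illustrative sketch only: a pinhole camera with an assumed focal length f.
# It shows why one 2D keypoint admits many 3D candidates; it is NOT the
# paper's diffusion model.

def project(point3d, f=1.0):
    """Perspective-project a 3D point (x, y, z) with z > 0 onto the image plane."""
    x, y, z = point3d
    return (f * x / z, f * y / z)


def candidates_along_ray(keypoint2d, depths, f=1.0):
    """Return one 3D point per depth, all of which project to keypoint2d."""
    u, v = keypoint2d
    return [(u * z / f, v * z / f, z) for z in depths]


keypoint = (0.25, -0.5)  # a single detected 2D joint (hypothetical values)
hypotheses = candidates_along_ray(keypoint, depths=[2.0, 4.0, 8.0])
reprojected = [project(p) for p in hypotheses]

for p, q in zip(hypotheses, reprojected):
    print(p, "->", q)
```

All three hypotheses differ in 3D yet reproject to the same keypoint, which is why a single-frame method benefits from producing several candidates rather than one estimate.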
translated by Google Translate
How can we accurately identify new memory workloads while classifying known memory workloads? Verifying DRAM (Dynamic Random Access Memory) using various workloads is an important task for guaranteeing DRAM quality. A crucial component of this process is open-set recognition, which aims to detect new workloads not seen in the training phase. Despite its importance, existing open-set recognition methods are unsatisfactory in terms of accuracy because they fail to exploit the characteristics of workload sequences. In this paper, we propose Acorn, an accurate open-set recognition method that captures the characteristics of workload sequences. Acorn extracts two types of feature vectors to capture sequential patterns and spatial locality patterns in memory access. Acorn then uses the feature vectors to accurately classify a subsequence into one of the known classes or identify it as the unknown class. Experiments show that Acorn achieves state-of-the-art accuracy, giving up to 37 percentage points higher unknown-class detection accuracy than existing methods while achieving comparable known-class classification accuracy.
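The open-set decision described above (assign a known class or flag the input as unknown) can be sketched with a generic nearest-centroid rule and a distance threshold. This is a stand-in technique chosen for illustration, not Acorn's actual classifier; the centroids, the threshold, and the function name are all hypothetical.

```python
import math

# Illustrative sketch only: nearest-centroid classification with a rejection
# threshold. This is a generic open-set baseline, NOT the Acorn method.

def classify_open_set(feature, centroids, threshold):
    """Return the nearest known class label, or "unknown" if it is too far away."""
    best_label, best_dist = None, math.inf
    for label, centroid in centroids.items():
        d = math.dist(feature, centroid)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else "unknown"


# Hypothetical class centroids in a 2D feature space.
centroids = {"workload_A": (0.0, 0.0), "workload_B": (10.0, 0.0)}

near_a = classify_open_set((1.0, 0.0), centroids, threshold=3.0)
far_from_all = classify_open_set((5.0, 5.0), centroids, threshold=3.0)
print(near_a, far_from_all)
```

The threshold trades off the two accuracies the abstract reports: a tighter threshold rejects more inputs as unknown, at the cost of misrejecting some known-class samples.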
Inspired by the impressive performance of recent face image editing methods, several studies have extended these methods to the face video editing task. One of the main challenges here is temporal consistency among edited frames, which remains unresolved. To this end, we propose a novel face video editing framework based on diffusion autoencoders that, for the first time among face video editing models, successfully extracts decomposed features of identity and motion from a given video. This modeling allows us to edit the video consistently by simply manipulating the temporally invariant feature in the desired direction. Another unique strength of our model is that, because it is based on diffusion models, it offers both reconstruction and editing capabilities at the same time and, unlike existing GAN-based methods, is robust to corner cases in in-the-wild face videos (e.g., occluded faces).
Generalized Labeled Multi-Bernoulli (GLMB) densities arise in a host of multi-object system applications, playing a role analogous to that of Gaussians in single-object filtering. However, computing the GLMB filtering density requires solving NP-hard problems. To alleviate this computational bottleneck, we develop a linear-complexity Gibbs sampling framework for GLMB density computation. Specifically, we propose a tempered Gibbs sampler that exploits the structure of the GLMB filtering density to achieve an $\mathcal{O}(T(P+M))$ complexity, where $T$ is the number of iterations of the algorithm, and $P$ and $M$ are the numbers of hypothesized objects and measurements, respectively. This innovation enables an $\mathcal{O}(T(P+M+\log(T))+PM)$ complexity implementation of the GLMB filter. Convergence of the proposed Gibbs sampler is established, and numerical studies are presented to validate the proposed GLMB filter implementation.
Traversability estimation for mobile robots in off-road environments requires more than the conventional semantic segmentation used in constrained environments such as on-road conditions. Recently, approaches that learn traversability estimation from past driving experiences in a self-supervised manner have emerged, as they can significantly reduce human labeling costs and labeling errors. However, self-supervised data provide supervision only for the regions actually traversed, which induces epistemic uncertainty due to the scarcity of negative information. Negative data are rarely harvested because the system can be severely damaged while logging them. To mitigate this uncertainty, we introduce a deep metric learning-based method that incorporates unlabeled data with a few positive and negative prototypes and jointly learns semantic segmentation and traversability regression. To rigorously evaluate the proposed framework, we introduce a new evaluation metric that comprehensively evaluates both the segmentation and the regression. Additionally, we construct 'Dtrail', a driving dataset collected in off-road environments with a mobile robot platform, which contains a wide variety of negative data. We examine our method on Dtrail as well as on the publicly available SemanticKITTI dataset.
This work presents six structural quality metrics for measuring the quality of knowledge graphs and analyzes five cross-domain knowledge graphs on the web (Wikidata, DBpedia, YAGO, Google Knowledge Graph, Freebase) as well as 'Raftel', Naver's integrated knowledge graph. A 'good knowledge graph' should define detailed classes and properties in its ontology so that real-world knowledge can be expressed richly. In addition, instances and RDF triples should actively use those classes and properties. We therefore examine the internal quality of knowledge graphs numerically by focusing on the structure of the ontology, which is the schema of a knowledge graph, and on the degree to which it is used. The analysis reveals characteristics of a knowledge graph that cannot be identified from scale-related indicators alone, such as the number of classes and properties.
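The idea that instances and triples should actively use the ontology's classes and properties can be made concrete with two simple usage ratios. The abstract does not define its six metrics, so the two functions below are hypothetical illustrations of a "degree of use" measure, not the paper's metrics; the toy triples and the `rdf:type` string convention are also assumptions.

```python
# Illustrative sketch only: two hypothetical "degree of use" ratios over
# (subject, predicate, object) triples. These are NOT the six metrics
# proposed in the paper, which the abstract does not spell out.

RDF_TYPE = "rdf:type"


def class_usage_ratio(classes, triples):
    """Fraction of defined classes that have at least one typed instance."""
    used = {o for (_, p, o) in triples if p == RDF_TYPE and o in classes}
    return len(used) / len(classes) if classes else 0.0


def property_usage_ratio(properties, triples):
    """Fraction of defined properties that appear as a predicate in some triple."""
    used = {p for (_, p, _) in triples if p in properties}
    return len(used) / len(properties) if properties else 0.0


# Toy ontology and data (hypothetical).
classes = {"Person", "City", "River"}
properties = {"bornIn", "flowsThrough"}
triples = [
    ("alice", RDF_TYPE, "Person"),
    ("seoul", RDF_TYPE, "City"),
    ("alice", "bornIn", "seoul"),
]

class_ratio = class_usage_ratio(classes, triples)
property_ratio = property_usage_ratio(properties, triples)
print(class_ratio, property_ratio)
```

Two graphs with the same number of classes and properties can score very differently on ratios like these, which is the kind of difference that scale-related indicators alone would miss.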
Single-image 3D human reconstruction aims to reconstruct the 3D textured surface of the human body from a single image. While implicit function-based methods have recently achieved reasonable reconstruction performance, they still suffer from degraded surface geometry and texture quality when viewed from an unobserved direction. In response, to generate a realistic textured surface, we propose ReFu, a coarse-to-fine approach that refines the projected backside-view image and fuses the refined image to predict the final human body. To suppress the diffused occupancy that causes noise in projection images and reconstructed meshes, we propose to train occupancy probability by simultaneously utilizing 2D and 3D supervision with occupancy-based volume rendering. We also introduce a refinement architecture that generates detail-preserving backside-view images with front-to-back warping. Extensive experiments demonstrate that our method achieves state-of-the-art performance in 3D human reconstruction from a single image, showing enhanced geometry and texture quality from an unobserved view.
Recently, numerous studies have investigated cooperative traffic systems using vehicle-to-everything (V2X) communication. Unfortunately, when multiple autonomous vehicles are deployed and exposed to communication failure, the ideal conditions of different vehicles may conflict, leading to adversarial situations on the road. In South Korea, virtual and real-world urban autonomous multi-vehicle races were held in March and November of 2021, respectively. During the competition, multiple vehicles were involved simultaneously, which required maneuvers such as overtaking low-speed vehicles, negotiating intersections, and obeying traffic laws. In this study, we introduce a fully autonomous driving software stack for deploying a competitive driving model, which enabled us to win the urban autonomous multi-vehicle races. We evaluate module-based systems such as navigation, perception, and planning in real and virtual environments. Additionally, we analyze traffic using the position data of multiple vehicles collected over communication to gain further insight into a multi-agent autonomous driving scenario. Finally, we propose a traffic analysis method for comparing the spatial distributions of multiple autonomous vehicles. We study the similarity distribution between each team's driving log data to determine the impact of competitive autonomous driving on the traffic environment.
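To safely and successfully navigate autonomous vehicles in unstructured environments, the traversability of terrain should vary according to the driving capabilities of the vehicle. Actual driving experience can be used in a self-supervised manner to learn vehicle-specific traversability. However, existing methods for learning self-supervised traversability do not scale to learning the traversability of various vehicles. In this work, we introduce a scalable framework for learning self-supervised traversability, which can learn traversability directly from vehicle-terrain interaction without any human supervision. We train a neural network that predicts, from 3D point clouds, the proprioceptive experience a vehicle would undergo. Using a novel PU learning method, the network simultaneously identifies non-traversable regions, where estimates can be overconfident. With driving data from various vehicles collected in simulation and in the real world, we show that our framework is capable of learning the self-supervised traversability of various vehicles. By integrating our framework with a model predictive controller, we demonstrate that the estimated traversability leads to effective navigation that enables distinct maneuvers depending on the driving characteristics of the vehicle. In addition, experimental results validate the ability of our method to identify and avoid non-traversable regions.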
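As pre-trained language models become more resource-demanding, the inequality between resource-rich languages such as English and resource-scarce languages is worsening. This can be attributed to the fact that the amount of available training data in each language follows a power-law distribution, and most languages fall in the long tail of that distribution. Several research areas attempt to mitigate this problem. For example, in cross-lingual transfer learning and multilingual training, the goal is to benefit long-tail languages through knowledge acquired from resource-rich languages. Although successful, existing work has mainly focused on experimenting with as many languages as possible. As a result, targeted in-depth analysis is largely absent. In this study, we focus on a single low-resource language and perform extensive evaluation and probing experiments using cross-lingual post-training (XPT). To make the transfer scenario challenging, we choose Korean as the target language, since it is a language isolate and thus shares almost no typological features with English. Results show that XPT not only outperforms or performs on par with monolingual models trained on orders of magnitude more data, but is also highly efficient in the transfer process.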